Pig vs Hive: Benchmarking High Level Query Languages

Authors

  • Benjamin Jakobus
  • Peter McBrien
Abstract

This article presents results from two benchmark sets, run on small clusters of 6 and 9 nodes, applied to Hive and Pig running on Hadoop 0.14.1. The first set of results was obtained by replicating the Apache Pig benchmark published by the Apache Foundation on 11/07/07, which served as a baseline for comparing major Pig Latin releases. The second set was obtained by applying the TPC-H benchmarks. The two benchmarks produced conflicting results: the first indicated that Pig outperformed Hive on most operations, whereas the TPC-H results provide evidence that Hive is significantly faster than Pig. The article analyzes the two benchmarks and concludes with a set of differences and a justification of the results. It presumes that the reader has a basic knowledge of Hadoop and big data; it is not intended as an introduction to Hadoop, Pig or Hive. The results stem from 2013, when the author spent a year at Imperial College London.

Similar resources

Comparing High Level MapReduce Query Languages

The MapReduce parallel computational model is of increasing importance. A number of High Level Query Languages (HLQLs) have been constructed on top of the Hadoop MapReduce realization, primarily Pig, Hive, and JAQL. This paper makes a systematic performance comparison of these three HLQLs, focusing on scale up, scale out and runtime metrics. We further make a language comparison of the HLQLs fo...

ReStore: Reusing Results of MapReduce Jobs

Analyzing large scale data has emerged as an important activity for many organizations in the past few years. This large scale data analysis is facilitated by the MapReduce programming and execution model and its implementations, most notably Hadoop. Users of MapReduce often have analysis tasks that are too complex to express as individual MapReduce jobs. Instead, they use high-level query lang...

A Comparison of Hadoop Tools for Analyzing Tabular Data

The paper describes the application of Hadoop modules: MapReduce, Pig and Hive, for processing and analyzing large amounts of tabular data acquired from a computer simulation of heat transfer in bio tissues. The Apache Hadoop is an open source environment for storing and analyzing BigData. It was installed on a cluster of six computing nodes, each with four cores. The implemented MapReduce job ...

On Teaching Big Data Query Languages

Big data computing systems (e.g., Hadoop) have recently seen tremendous intake as computing platforms for data-intensive applications. The emergence of such big data computing systems has triggered plenty of new techniques for data management. For example, several new query paradigms have been introduced including map-reduce, HiveQL, Impala, Pig Latin, and Spark. In order to cope with this bi...

Gumbo: Guarded Fragment Queries over Big Data

We present Gumbo, a system for the efficient evaluation of guarded fragment queries on top of Hadoop and Spark. A key asset of Gumbo is the reduced number of jobs in comparison with recent systems such as Pig, Hive or Shark. For unnested guarded fragment queries, Gumbo even provides a constant bound on the number of jobs independent of the size of the query. In the demo, we will address the fol...

Publication date: 2014